Artificial intelligence in differentiating tropical infections: A step ahead

Background and objective
Differentiating tropical infections is difficult because their clinical and laboratory presentations are similar. Sophisticated differential tests and prediction tools are better ways to tackle this issue. Here, we aimed to develop a decision-making tool to assist clinicians in differentiating the common tropical infections.
Methodology
A cross-sectional study using a 9-item self-administered questionnaire was performed to understand the need for a decision-making tool and the parameters it should include. The most significant differential parameters among the identified infections were determined through a retrospective study, and a decision tree was developed. Based on the parameters identified, a multinomial logistic regression model and a machine learning model were developed to better differentiate the infections.
Results
A total of 40 physicians involved in the management of tropical infections were included in the need analysis. Dengue, malaria, leptospirosis and scrub typhus were the common tropical infections in our setting. Sodium, total bilirubin, albumin, lymphocytes and platelets were the laboratory parameters, and abdominal pain, arthralgia, myalgia and urine output were the clinical presentations, identified as better predictors. Multinomial logistic regression analysis with dengue as the reference revealed a predictability of 60.7%, 62.5% and 66% for dengue, malaria and leptospirosis, respectively, whereas scrub typhus showed a predictability of only 38%. The multi-class machine learning models were observed to have an overall predictability of 55–60%, whereas the binary classification machine learning algorithms showed an average of 79–84% for one vs other and 69–88% for one vs one disease category.
Conclusion
This is a first-of-its-kind study in which both statistical and machine learning approaches were explored simultaneously for differentiating tropical infections. Machine learning techniques in healthcare sectors will aid in early detection and better patient care.



Author summary
Distinguishing tropical infections is difficult because their clinical and laboratory presentations are similar. This is a first-of-its-kind study in which both statistical and machine learning approaches were explored simultaneously for differentiating tropical infections. According to the need analysis, dengue, malaria, leptospirosis and scrub typhus were the common tropical infections in our setting. Better predictors in terms of laboratory parameters and clinical presentations were identified from retrospective analysis and used for the regression and machine learning models. Accuracy, true positive rate/sensitivity/recall, false positive rate, precision/positive predictive value, F-measure and ROC area were calculated for both the training and validation sets (10-fold cross validation) for all modelling approaches and diseases (One vs One and One vs others). All the models were observed to have an acceptable range of performance in differentiating tropical infections. Albumin can be considered the main parameter in differentiating these tropical infections. These models should be implemented in daily clinical practice via mobile or desktop applications or tools.

Introduction
Tropical infectious diseases thrive in hot and humid areas and affect the health of people living in tropical climates. They are more common in developing countries, presenting as dengue, malaria, leptospirosis, leishmaniasis, scrub typhus, and rickettsial fever [1,2]. They are the major cause of acute febrile illnesses, a major health concern for people in endemic regions [3]. Globally, tropical infectious diseases blight the lives of more than one billion people, with one-half of deaths annually. Children are more prone to tropical infections, and every year one-third of the infected children die due to malaria [4,5].
Factors such as residential status (rural vs. urban), living conditions, climatic conditions, nutritional status, environmental factors, poverty, and population density are some of the contributory risk factors [6,7]. Tropical infections are a serious threat to public health, especially in industrialized countries. Dengue is one of the 17 diseases categorized as neglected tropical diseases in the WHO roadmap, which could otherwise be effectively controlled. Almost 2.5 billion people are at risk of contracting the infection, and complications can be prevented only by early diagnosis and timely, appropriate treatment [8,9]. Saumyen De et al. noted that most tropical illnesses, such as leptospirosis, malaria, scrub typhus, and endemic fever, are often confused with dengue fever, especially in a country like India [10]. Malaria was once a major global threat, with tough challenges especially in underdeveloped countries, though the rate of new cases declined globally by 37% between 2000 and 2015. As reported by researchers at the Yale School of Public Health, about one million people in underdeveloped regions contract leptospirosis, with nearly 59,000 deaths, especially in Latin America, Asia, Africa, and island nations [11].
The unexpected emergence of tropical infections could be due to similar laboratory values, overlapping symptomatology, early asymptomatic presentation, misunderstanding, and delayed diagnosis, all of which add to complications [12]. Predicting outbreaks is challenging because of the non-specific clinical presentations of these tropical infectious diseases [13]. The gold standard of diagnosis at an early stage involves specific serological investigations. When seasonal patterns coincide and symptoms overlap, a delay in differential diagnosis complicates and worsens the situation. Usually, the specific differential test takes 24 hours for malaria and dengue and >24-48 hours for rickettsial fever-like leptospirosis and scrub typhus [14,15].
Detailed history taking, clinical examination, and evaluation of laboratory parameters from the beginning, together with serological values, lead to an accurate final diagnosis, though this process is tedious and burdensome for healthcare professionals [16]. With technological progress, machine learning, a sub-discipline of artificial intelligence (AI), has been playing a significant role in the advancement of healthcare technology [17]. AI has revolutionized the dynamic healthcare industry through its application in domains such as drug discovery, drug repurposing, in silico clinical trials, epidemic outbreak prediction, precision health, diagnosis, prognosis and prediction [18]. High mortality results not only from the lack of appropriate management strategies but also from challenges in early detection, risk stratification and severity prediction [19].
Thus, there is a need to develop a tool that makes optimal use of the distinct symptomatology of these infections and of distinguishable laboratory parameters for early identification of the cause of acute febrile illnesses. Early diagnosis could also lead to appropriate antibiotic therapy, thus reducing antibiotic resistance, along with better patient care and lower morbidity and mortality rates in the clinical setting [20]. Our study therefore aimed to develop a user-friendly decision-making tool to distinguish among tropical infections such as dengue, malaria, leptospirosis and scrub typhus in a tertiary care hospital, enabling early prediction in order to minimize complications due to delayed diagnosis.

Ethical approval
An observational study was conducted at a South Indian tertiary care centre for a period of 9 months, from July 2017 to March 2018, after obtaining Institutional Ethics Committee approval (IEC No: 509/2017). Written informed consent was obtained from the participants who were involved in the need analysis (phase I). The data for prediction tool development (phase II) were collected retrospectively from the medical record department; hence, informed consent was not applicable.

Study design and setting
This study included two phases: I) a need analysis phase and II) a prediction tool development phase. In phase I, a nine-item self-administered questionnaire was developed, validated, distributed and analysed to estimate physicians' need for differentiating tropical diseases in their setting and the components of such a tool. In phase II, based on these results, a prediction tool was developed with a decision tree, multinomial regression analysis and machine learning algorithms. A study flow diagram is given in Fig 1.

Need analysis
Questionnaire development.
A nine-item self-administered questionnaire, informed by a literature review, was developed to estimate the need for a tool to help physicians differentiate tropical infectious diseases in the hospital. The questionnaire was divided into two parts. The first part was disease specific and consisted of 6 questions, i.e., frequency of tropical infectious cases, number of cases treated in a week, commonly observed tropical infections, obstacles in treating tropical infections, challenges in management of infections and the need for tool development. The second part, on tool development, consisted of 3 questions regarding parameters to be included, format of the tool and additional suggestions.

Questionnaire validation.
The content validity of the questionnaire was assessed to check its appropriateness and relevance to the study. Six experts, including 2 physicians and 4 pharmacy academicians from the study setting, validated the questionnaire. The content validity index (CVI), comprising the validity of individual questions and the overall scale validity of the questionnaire, was calculated once all experts had completed the validation process, and the final validated questionnaire was prepared.
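The CVI calculation can be illustrated with a short sketch. It assumes a common convention that the study does not spell out: each expert rates an item on a 4-point relevance scale, ratings of 3 or 4 count as relevant, the item-level CVI is the proportion of experts rating the item relevant, and the scale-level CVI is the mean of the item-level values. The function names and the ratings are hypothetical and are not the study data.

```python
from statistics import mean

def item_cvi(ratings, relevant_from=3):
    """Proportion of experts whose rating counts as 'relevant' (assumed: 3 or 4)."""
    return sum(r >= relevant_from for r in ratings) / len(ratings)

def scale_cvi_average(ratings_by_item):
    """Scale-level CVI computed as the mean of the item-level CVIs."""
    return mean(item_cvi(r) for r in ratings_by_item.values())

# Hypothetical ratings from six experts (1 = not relevant ... 4 = highly relevant).
ratings_by_item = {
    "Q1": [4, 4, 3, 4, 3, 4],
    "Q2": [4, 3, 3, 2, 4, 4],
    "Q3": [3, 4, 4, 4, 4, 3],
}

i_cvi = {item: item_cvi(r) for item, r in ratings_by_item.items()}
s_cvi = scale_cvi_average(ratings_by_item)
print(i_cvi, round(s_cvi, 2))
```

With six experts, a single dissenting rating lowers an item's CVI to 5/6, which is still above the 0.78 threshold commonly applied to panels of this size.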

Dissemination of questionnaire and analysis.
The validated questionnaire was distributed to physicians and medical students (residents and post graduates) who were involved in treating tropical infections and acute febrile illnesses, to assess the need for developing and implementing a decision-making tool. Physicians or students who were not currently affiliated with the department of medicine or were not treating tropical infections at the hospital were excluded from the study. The need for tool development was explained to them on a one-to-one basis. Relevant data from the physicians' feedback on the questionnaire were collected, including the specific tropical infectious diseases and parameters to be included in the decision-making tool. The descriptive data were analysed and presented as frequency and percentage.

Prediction tool development
Data collection and statistical analysis.
A validated case report form (CRF) was designed based on the physicians' feedback, and patient data were collected retrospectively using it. Data were analysed using the Statistical Package for the Social Sciences (SPSS Inc., version 20.0). Descriptive data were presented as frequency and percentage, and continuous variables as mean (SD) or median (range). Only those clinical and laboratory parameters that were statistically significant (two-sided P ≤ 0.05) with a high odds ratio (OR) in multinomial logistic regression were considered for tool development.

Development of the data sets and decision making tool
We aimed to generate a simple scoring system that could differentiate among common tropical infections (dengue, leptospirosis, malaria and scrub typhus) using basic clinical and laboratory results available within a few hours of hospital admission, from which a simple decision tree model could be generated. Data from a total of 800 patients, 200 in each group (dengue, malaria, leptospirosis and scrub typhus), were collected and analysed. Thirteen variables were selected in order to differentiate among the tropical infections. For multinomial logistic regression analysis, the data were divided into two sets: a training set of 170 × 4 cases and a validation set of 30 × 4 cases, with dengue fever as the reference category.
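A per-disease split of this kind can be sketched as follows. The study does not state how cases were assigned to the two sets, so the random, seeded assignment below is an assumption, and the function name and placeholder records are hypothetical.

```python
import random

DISEASES = ["dengue", "malaria", "leptospirosis", "scrub typhus"]

def stratified_split(records, label_key, n_validation, seed=0):
    """Hold out n_validation records per class; the rest form the training set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    train, validation = [], []
    for group in by_class.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        validation.extend(shuffled[:n_validation])
        train.extend(shuffled[n_validation:])
    return train, validation

# Placeholder records: 200 per disease, identified only by an index.
records = [{"id": i, "diagnosis": d} for d in DISEASES for i in range(200)]
train, validation = stratified_split(records, "diagnosis", n_validation=30)
print(len(train), len(validation))
```

Splitting within each disease group keeps the four classes balanced in both sets, matching the 170 × 4 and 30 × 4 design.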

Machine learning algorithm
Waikato Environment for Knowledge Analysis (WEKA) software was used for machine learning modelling. Both binary classification and multi-class classification analyses were performed. Under binary classification, two strategies were adopted: first, a one vs rest (other) strategy, in which a single classifier per class is trained with samples of that class as positive and all other samples as negative; second, a one vs one strategy, in which two diseases are taken at a time. Multi-class classification involved several algorithms based on neural networks, decision trees, random forests, and multinomial regression. These models were applied to the same dataset.
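The two binary strategies differ only in how the four-class target is relabelled before a classifier is trained. The sketch below shows that relabelling step alone; it is not the WEKA workflow used in the study, and the helper names and the five example labels are hypothetical.

```python
from itertools import combinations

DISEASES = ["dengue", "malaria", "leptospirosis", "scrub typhus"]

def one_vs_rest_labels(labels, positive):
    """Relabel a multi-class target as 1 (positive class) or 0 (all others)."""
    return [1 if y == positive else 0 for y in labels]

def one_vs_one_subset(labels, first, second):
    """Keep only samples of the two chosen classes as (index, binary label) pairs."""
    return [(i, 1 if y == first else 0)
            for i, y in enumerate(labels) if y in (first, second)]

# Hypothetical diagnoses for five patients.
labels = ["dengue", "malaria", "scrub typhus", "leptospirosis", "dengue"]

ovr_tasks = {d: one_vs_rest_labels(labels, d) for d in DISEASES}  # 4 binary problems
ovo_pairs = list(combinations(DISEASES, 2))                       # 6 disease pairs
dengue_vs_malaria = one_vs_one_subset(labels, "dengue", "malaria")
print(len(ovr_tasks), len(ovo_pairs), dengue_vs_malaria)
```

With four diseases, one vs rest yields four classifiers, each trained on all cases, while one vs one yields six classifiers, each trained only on the cases of its two diseases.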

Results
A 9-item validated self-administered questionnaire was designed to determine physicians' demand for a decision-making tool to differentiate among tropical infections.

Need analysis
Validation of the questionnaire.
All 9 items in our questionnaire were found to be relevant and appropriate to our study, with a CVI greater than 0.78. The overall scale CVI (S-CVI) of our questionnaire was 0.94.
Dissemination of questionnaire.
The validated questionnaire was circulated among 40 physicians and post graduate students in the department of medicine, comprising heads of department (3), associate professors (3), assistant professors (4), senior residents (1), junior residents (14), post graduates (6) and interns (9).

Perspectives of physicians.
All 40 physicians reported that they often treat tropical infections, with an average of 24 cases weekly. Among the diseases observed, dengue (n = 16) was the most common, followed by scrub typhus (n = 15), malaria (n = 14), leptospirosis (n = 12) and influenza (n = 4). The majority (n = 24) faced challenges in treating tropical infections; most (n = 34) felt management of symptomatology to be difficult, followed by diagnosis (n = 30). The majority of the participants (n = 35) felt the need for a decision-making tool. Most of them agreed on including laboratory parameters (n = 34) and clinical presentations (n = 35) as the main criteria in the tool and preferred an online app format (n = 36). Based on the perspectives of the treating physicians, a validated CRF was used to collect the patient data.

Demographic characteristics of patients.
A total of 800 patients were included, with a male predominance (68.5%); 9.5% of the patients were geriatric. The mean age of the study population was 39.1±14.9 years. The geographical distribution showed higher rates of leptospirosis and scrub typhus in the north of the state, whereas dengue and malaria rates were higher in the south. The demographic details and clinical parameters of each disease are given in Table 1.
Based on the clinical presentation of all four tropical infections, a multinomial regression analysis was performed to identify the significant factors in predicting score of each disease.
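In a multinomial logistic regression with a reference category, each non-reference disease has its own linear predictor expressing the log-odds of that disease relative to the reference, and the reference predictor is fixed at zero. The sketch below shows how such log-odds translate into predicted probabilities for one patient. The log-odds values are hypothetical and are not the fitted coefficients of the study.

```python
import math

def baseline_category_probabilities(linear_predictors, reference="dengue"):
    """Convert per-class log-odds (relative to the reference) into probabilities."""
    scores = {reference: 0.0, **linear_predictors}  # reference log-odds is 0 by definition
    denominator = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / denominator for label, s in scores.items()}

# Hypothetical log-odds relative to dengue for a single patient.
eta = {"malaria": 0.4, "leptospirosis": -0.2, "scrub typhus": -1.0}
probabilities = baseline_category_probabilities(eta)
predicted = max(probabilities, key=probabilities.get)  # class with the highest probability
print(predicted, {k: round(v, 3) for k, v in probabilities.items()})
```

The ratio of any disease's probability to that of dengue equals the exponential of its linear predictor, which is why exponentiated coefficients are read as odds ratios against the reference category.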

Multinomial logistic regression analysis for significant parameters among scrub typhus, malaria, and leptospirosis with dengue disease category as reference
On the basis of various characteristics, a decision tree model was developed. According to the tree, albumin could be considered as the main parameter to differentiate among 4 tropical infections followed by platelets and bilirubin levels (p<0.05). The decision tree model is depicted in Fig 2.

Machine learning modelling
The WEKA machine learning tool was applied to test binary (one disease at a time) and multi-class (all four diseases) classification. Multi-class classification based on random forest (multiple decision trees), neural network (back propagation), decision tree and AdaBoost (logistic regression base classifier) showed an average accuracy of 55-60%. Binary classification (one vs. rest) using logistic regression showed an estimated accuracy of 79-84%. Binary classification with logistic regression taking two diseases at a time (one vs. one) showed an accuracy of 69-88%. Attribute visualization showed overlapping values among diseases, with no single attribute helping to classify the diseases. Correctly classified instances were 50.37% for decision tree, 56.62% for random forest, 59.75% for multinomial logistic regression, 59.75% for AdaBoost and 55.88% for multilayer perceptron. Binary classification of dengue vs. others showed exact classified  Table 3. Attribute visualization for each disease and the model classifications through machine learning modelling are presented in S1 File. Parameters such as true positive rate/sensitivity/recall, false positive rate, precision/positive predictive value, F-measure and receiver operating characteristic (ROC) area were also calculated for both the training and validation sets (10-fold cross validation) for all modelling approaches and diseases (One vs One and One vs others). The data visualization and parameters with respect to the disease variable are provided in S2 File.
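The reported performance parameters can all be read off a confusion matrix, treating each disease in turn as the positive class. The following sketch assumes that definition; the function name and the counts are hypothetical illustrations rather than the study results, and ROC area is omitted because it requires predicted scores rather than counts.

```python
def per_class_metrics(confusion, labels):
    """One-vs-rest metrics per class from a square confusion matrix.

    confusion[i][j] is the number of cases with true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp  # predicted k, true class otherwise
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        metrics[label] = {"recall": recall, "fpr": fpr,
                          "precision": precision, "f_measure": f_measure}
    accuracy = sum(confusion[k][k] for k in range(len(labels))) / total
    return accuracy, metrics

# Hypothetical counts for illustration only (rows: true class, columns: predicted class).
labels = ["dengue", "malaria", "leptospirosis", "scrub typhus"]
confusion = [
    [18, 4, 3, 5],
    [5, 19, 3, 3],
    [3, 3, 20, 4],
    [8, 5, 6, 11],
]
accuracy, metrics = per_class_metrics(confusion, labels)
print(round(accuracy, 3), metrics["dengue"])
```

Overall accuracy is the share of cases on the diagonal, whereas recall and false positive rate are computed separately for each disease, so a model can have moderate accuracy while performing poorly on a single class.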

Effect of age on model performance
To study the effect of age on model performance, the data set was divided into four groups of equal frequency: 18-26.5, 26.5-36.5, 36.5-50.5 and 50.5-80 years. Random forest, multinomial logistic regression and multi-layer perceptron classifiers were applied. The accuracy of the models based on age classification was 67 to 71%. We did not observe any changes with age in accuracy or in other parameters such as true positive rate/sensitivity/recall, false positive rate, precision/positive predictive value, F-measure and ROC area. The results are provided in S3 File.
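Equal-frequency grouping places the cut points so that each age group holds roughly the same number of patients. The sketch below uses quartiles from the Python standard library for this purpose. The exact rule used to place the boundaries in the study is not stated, so the quantile method is an assumption, and the ages and function names are hypothetical.

```python
from statistics import quantiles

def equal_frequency_edges(values, n_bins=4):
    """Cut points that split the values into n_bins groups of roughly equal size."""
    return quantiles(values, n=n_bins, method="inclusive")

def assign_bin(value, edges):
    """Index of the group a value falls into (0 = youngest group)."""
    return sum(value > edge for edge in edges)

# Hypothetical ages, not the study data.
ages = [18, 21, 24, 26, 29, 33, 35, 38, 42, 47, 50, 55, 61, 68, 74, 80]
edges = equal_frequency_edges(ages)
groups = [assign_bin(a, edges) for a in ages]
print(edges, [groups.count(i) for i in range(4)])
```

Because the cut points follow the data rather than fixed widths, the resulting age ranges are uneven in length, as in the four ranges reported above.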

Discussion
Tropical infections such as dengue, malaria, scrub typhus, leptospirosis, rickettsial fever, and leishmaniasis are now the prevailing causes of febrile illness, especially in endemic regions [21]. Similar laboratory parameters and overlapping symptomatology make these infections difficult to diagnose at an early stage. Dengue is often considered the cause of febrile illness, even in cases of scrub typhus and rickettsial fever. Similarly, in the Southeast Asian region, malaria and leptospirosis are the other common causes of febrile illness. Mostly, these diseases present with non-specific clinical features, which makes them difficult to differentiate from each other [22]. Hence, a tool that could differentiate among these diseases based on the initial presentation would help physicians with early diagnosis and effective management. In India, especially in many parts of southern India, the most common and recurrent presumptive diagnoses among those hospitalized with an undifferentiated fever pattern are dengue, malaria, leptospirosis and scrub typhus. Researchers have made many attempts to differentiate among these illnesses [23,24]. Mitra et al. elaborated on differentiating clinical and laboratory parameters in patients with scrub typhus and dengue. They included age (>30 and ≤30 years), haemoglobin (≤14 and >14 g/dL), total white blood cell count (<4000, 4001-7000 and >7000 cells/cumm), oxygen saturation (>90% and ≤90%), total bilirubin (≤2 and >2 mg/dL), altered sensorium (present or absent) and serum glutamic-oxaloacetic transaminase (≤200 and >200 IU/dL), all of which were significant in predicting differences between these two illnesses [25]. Varma et al. compared dengue and leptospirosis and reported muscle tenderness, leukocytopenia, elevated erythrocyte sedimentation rate, oliguria, acute renal failure, icterus, anaemia, thrombocytopenia and hypoalbuminaemia to be more common in leptospirosis than in dengue [26].
The mortality ratio of leptospirosis to dengue was 18:1. However, although there have been attempts to identify differentiating features among these diseases, no model based on multinomial logistic regression analysis or machine learning that better differentiates among dengue, malaria, scrub typhus and leptospirosis has yet been developed. In our study, we considered dengue, malaria, scrub typhus and leptospirosis because they are the most prevalent tropical infections in our setting and have similar clinical presentations, which makes them difficult to differentiate at an early stage. In addition, the inference from our pilot survey among medical professionals who deal with tropical infections helped us to proceed with these infections. We retrospectively collected laboratory parameters of 800 patients (200 in each group), and the factors with the highest OR (p<0.05) were considered for model development. These factors include four clinical presentations, namely i) abdominal pain (present/absent), ii) arthralgia (present/absent), iii) myalgia (present/absent) and iv) urine output (decreased/normal); and five laboratory parameters, namely v) sodium level (100-140 ml, 140ml and above), vi) total bilirubin level (0-1.6, 1.6-3.2 and ≥3.2 mg/dL), vii) albumin (0-3.4, ≥3.5 mg/dL), viii) lymphocytes (10-20, 21-40, ≥40 cells/cumm) and ix) platelets (5000-50000, 50000-100000, 100000-150000 and 150000-450000 cells/cumm).
Multinomial logistic regression and decision tree analysis indicated that albumin could be considered the main parameter to differentiate among the 4 tropical infections, followed by platelets and bilirubin levels (p<0.05). These findings were in accordance with the models developed by Mitra et al. and Varma et al. [25,26]. On the basis of multinomial logistic regression, dengue showed 60.7%, leptospirosis 66%, malaria 62% and scrub typhus only 38% predictability. Recent studies have expanded machine learning applications to all aspects of medicine and associated fields. Davi C et al. [27] used genome markers to identify individuals at high risk of developing the severe dengue phenotype even in uninfected conditions. Another study by McLaughlin M et al. [28] demonstrated the use of malaria rapid diagnostic test "truth" data along with digital mHealth platform clinical assessments and clinical data for better identification of children with malaria among those with febrile illness. Binary classification machine learning models, i.e., one vs. rest, showed an average predictability of 79-84%, while one disease vs. another showed a score of 69-88%. Multi-class models such as neural network, decision tree, multinomial regression and random forest showed a predictability score of 55-60%. This could be due to overlapping symptoms and a less accurate arrangement of laboratory and other parameters. In comparison with other studies, our study took a dual approach, i.e., establishing models based on both machine learning and multinomial logistic regression analysis. In addition, we included 4 different tropical diseases along with laboratory and clinical parameters to minimize the chances of misclassification bias. The sample size of our study was also relatively high.
In addition to this, the effect of age on model performance was analysed to avoid the confounding effect of age and we did not find any change in model performance based on the age [29,30,31].
Future studies should be conducted to provide more insight into the application of this study by developing artificial intelligence or computer-assisted technology for daily clinical practice. This will help to assess the credibility of our findings and to upgrade the models further based on the clinical scenario. Future studies could also focus on broader aspects of the diseases beyond the parameters considered in our study, based on better knowledge of their geographical distribution.
Limitations of our study include the retrospective data collection, because of which clinical parameters during initial visits to the emergency department or clinic (outpatient setting) could not be recorded. The findings from these single-centre data cannot be generalized to other parts of the world, as the nature and presentation of tropical diseases vary from location to location. The WEKA software that we used for machine learning does not provide parameters such as true negatives, false negatives and specificity. However, parameters such as accuracy, true positive rate/sensitivity/recall, false positive rate, precision/positive predictive value, F-measure and ROC area were calculated.
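Where the true positive rate, the false positive rate and the class sizes are available, the unreported quantities follow arithmetically: specificity is one minus the false positive rate, and the false negative and true negative counts can be recovered from the rates and the number of cases in each class. The sketch below illustrates this with hypothetical values for a one vs rest problem; the function name and the rates are not taken from the study.

```python
def derive_missing_counts(tpr, fpr, n_positive, n_negative):
    """Recover specificity, false negatives and true negatives from reported rates."""
    specificity = 1.0 - fpr                             # TN / (TN + FP)
    false_negatives = round((1.0 - tpr) * n_positive)   # positives the model missed
    true_negatives = round(specificity * n_negative)    # negatives correctly rejected
    return specificity, false_negatives, true_negatives

# Hypothetical one-vs-rest result: 200 cases of the target disease, 600 others.
specificity, fn, tn = derive_missing_counts(tpr=0.60, fpr=0.10,
                                            n_positive=200, n_negative=600)
print(specificity, fn, tn)
```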

Conclusion
Technology-integrated healthcare contributes to early diagnosis, potentially improving quality of life. Physicians have a strong need for a tool that differentiates tropical infections. Early diagnosis of tropical infections helps to further improve patient care. Our study is the first of its kind in which both machine learning and statistical techniques were applied to develop a model for tropical infectious diseases, which needs to be studied further at the implementation level.